Load the necessary libraries
library(car) #for regression diagnostics
library(broom) #for tidy output
library(ggfortify) #for model diagnostics
library(sjPlot) #for outputs
library(knitr) #for kable
library(effects) #for partial effects plots
library(emmeans) #for estimating marginal means
library(ggeffects) #for partial effects plots
library(modelr) #for auxillary modelling functions
library(DHARMa) #for residual diagnostics plots
library(performance) #for residuals diagnostics
library(see) #for plotting residuals
library(tidyverse) #for data wrangling
theme_set(theme_classic())
Polis et al. (1998) were intested in modelling the presence/absence of lizards (Uta sp.) against the perimeter to area ratio of 19 islands in the Gulf of California.
Uta lizard
Format of polis.csv data file
| island | ratio | pa |
|---|---|---|
| Bota | 15.41 | 1 |
| Cabeza | 5.63 | 1 |
| Cerraja | 25.92 | 1 |
| Coronadito | 15.17 | 0 |
| .. | .. | .. |
| island | Categorical listing of the name of the 19 islands used - variable not used in analysis. |
| ratio | Ratio of perimeter to area of the island. |
| pa | Presence (1) or absence (0) of Uta lizards on island. |
The aim of the analysis is to investigate the relationship between island parimeter to area ratio and the presence/absence of Uta lizards.
polis = read_csv('../data/polis.csv', trim_ws=TRUE)
## Parsed with column specification:
## cols(
## ISLAND = col_character(),
## RATIO = col_double(),
## PA = col_double()
## )
polis <- janitor::clean_names(polis)
glimpse(polis)
## Rows: 19
## Columns: 3
## $ island <chr> "Bota", "Cabeza", "Cerraja", "Coronadito", "Flecha", "Gemelose…
## $ ratio <dbl> 15.41, 5.63, 25.92, 15.17, 13.04, 18.85, 30.95, 22.87, 12.01, …
## $ pa <dbl> 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1
head(polis)
str(polis)
## tibble [19 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ island: chr [1:19] "Bota" "Cabeza" "Cerraja" "Coronadito" ...
## $ ratio : num [1:19] 15.41 5.63 25.92 15.17 13.04 ...
## $ pa : num [1:19] 1 1 1 0 1 0 0 0 0 1 ...
## - attr(*, "spec")=
## .. cols(
## .. ISLAND = col_character(),
## .. RATIO = col_double(),
## .. PA = col_double()
## .. )
Model formula: \[ y_i \sim{} \mathcal{Bin}(n, p_i)\\ ln\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_i \]
where \(y_i\) represents the \(i\) observed values, \(n\) represents the number of trials (in the case of logistic, this is always 1), \(p_i\) represents the probability of lizards being present in the \(i^{th}\) poluation, and \(\beta_0\) and \(\beta_1\) represent the intercept and slope respectively.
ggplot(polis, aes(y=pa, x=ratio))+
geom_point()
ggplot(polis, aes(y=pa, x=ratio))+
geom_point()+
geom_smooth(method='glm', formula=y~x,
method.args=list(family='binomial'))
polis.glm <- glm(pa ~ ratio, data = polis,
family = binomial(link = "logit"))
summary(polis.glm)
##
## Call:
## glm(formula = pa ~ ratio, family = binomial(link = "logit"),
## data = polis)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6067 -0.6382 0.2368 0.4332 2.0986
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.6061 1.6953 2.127 0.0334 *
## ratio -0.2196 0.1005 -2.184 0.0289 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 26.287 on 18 degrees of freedom
## Residual deviance: 14.221 on 17 degrees of freedom
## AIC: 18.221
##
## Number of Fisher Scoring iterations: 6
NOTE: making the probability a factor will mess up the emmeans backtransformation later on! Need to fit as numeric!
autoplot(polis.glm, which = 1:6, label.repel = T)
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
Residual plots impossible to distinguish any differences. However, QQ plot looks ok, so seems to conform well to a binomial distribution. Cook’s d is showing the third point to have high influence and leverage.
DHARMa::simulateResiduals(polis.glm, plot = T)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.8121925 0.8900391 0.9791159 0.3742036 0.985986 0.3423198 0.7653927 0.4519977 0.01965412 0.6090317 0.2424086 0.5740225 0.1126201 0.7365416 0.00219597 0.0698516 0.7393137 0.9747781 0.3383206
We can clearly see in the DHARMa package’s right plot that there are no issues with the residuals here.
Three different ways to plot the predicted effect:
plot_model(polis.glm, type = "eff", show.data = T)
## $ratio
plot(allEffects(polis.glm, residuals = T), type = "response")
polis.glm %>% ggemmeans(~ ratio) %>% plot(add.data=T)
summary(polis.glm)
##
## Call:
## glm(formula = pa ~ ratio, family = binomial(link = "logit"),
## data = polis)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6067 -0.6382 0.2368 0.4332 2.0986
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.6061 1.6953 2.127 0.0334 *
## ratio -0.2196 0.1005 -2.184 0.0289 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 26.287 on 18 degrees of freedom
## Residual deviance: 14.221 on 17 degrees of freedom
## AIC: 18.221
##
## Number of Fisher Scoring iterations: 6
There seems to be a negative relationship, whereas the ratio increases, the chance of the lizard occurring decreases!
To get the relationship, use exponential backtransform for log-link:
exp(coef(polis.glm)[1])
## (Intercept)
## 36.82103
Lizards are 36 times as likely to be present rather than absent given the x-var/ ratio = 0.
exp(coef(polis.glm)[2])
## ratio
## 0.8028734
For every 1 unit change in ratio, the odds of a lizard occurring are 0.80 of what they were previously (declines by 20% per unit).
tidy(polis.glm, conf.int = T)
LD50 = -intercept/slope
coef(polis.glm) %>% {-.[1]/.[2]} %>% as.numeric
## [1] 16.4242
Get Rasenbush pseudo-R^2:
1 - (polis.glm$deviance / polis.glm$null.deviance)
## [1] 0.4590197
Get another proposed pseudo-R^2, which is a more recent and more well accepted pseudo-R^2:
MuMIn::r.squaredLR(polis.glm)
## [1] 0.4700986
## attr(,"adj.r.squared")
## [1] 0.6273785
polis_grid <- polis %>% data_grid(ratio = seq_range(ratio, n=100))
newdata <- emmeans(polis.glm, ~ ratio,
at = polis_grid,
type = "response") %>% as.data.frame
newdata %>%
ggplot(aes( x=ratio)) +
geom_ribbon(aes(ymin=asymp.LCL, ymax=asymp.UCL),
fill='blue', alpha=0.2) +
geom_line(aes(y = prob)) +
geom_point(data = polis, aes(y = pa))